Title

Using a K-NN Classification Model to Predict the Genre of a given Song based on Danceability and Energy

Introduction

Today, listening to music is more accessible than ever. Popular streaming platforms such as Spotify make it easy for users to discover new music genres and receive recommendations aligned with their preferences (Ignatius Moses Setiadi et al., 2020). Music recommendation plays a crucial role in helping users find songs tailored to their tastes, and it often relies on classifying music genres with a variety of classifiers (Ignatius Moses Setiadi et al., 2020). The enjoyment of a song can depend on many factors, such as emotional impact, catchy melodies, or impactful lyrics (Khan et al., 2022). In addition, audio features like loudness, tempo, or energy can be used to classify a song’s genre, and are often used by music streaming platforms to recommend new songs to their users (Khan et al., 2022).

Based on this information, the question we want to answer with our project is: “What is the genre of a given song based on its danceability and energy values?” This is a classification question, which uses one or more variables to predict the value of a categorical variable of interest. We will use the K-nearest neighbors (K-NN) algorithm to predict the genre of a chosen song. K-NN predicts the class of a test observation by calculating the Euclidean distance between it and all the training points (Taunk et al., 2019). The test observation is assigned to the class most common among its K nearest neighbors, with K being the number of neighbors considered (Taunk et al., 2019). The best value of K depends on the dataset and is not always the largest value, because as K grows, points from other classes may be included in the neighborhood and blur the classification boundaries (Taunk et al., 2019). The dataset we will be using is “Dataset of songs in Spotify” from Kaggle. This dataset has 22 columns, including: danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, type, id, uri, track_href, analysis_url, duration_ms, time_signature, song_name, genre, and title. The full list of genres includes Trap, Techno, Techhouse, Trance, Psytrance, Dark Trap, DnB (drum and bass), Hardstyle, Underground Rap, Trap Metal, Emo, Rap, RnB, Pop, and Hiphop. We will be using danceability (from 0 to 0.99), energy (from 0 to 1), and the three genres Emo, hardstyle, and Hiphop (as labeled in the dataset) in our project.
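As a toy sketch of this idea (illustrative only: the handful of made-up training songs and the choice of K = 3 below are not from our dataset, and our actual model is fitted with tidymodels in the code that follows):

```r
# Toy K-NN sketch: classify one new song by the majority genre of its
# K nearest training songs in (danceability, energy) space.
# All values below are invented for illustration.
train <- data.frame(
  danceability = c(0.49, 0.48, 0.70, 0.72, 0.50, 0.69),
  energy       = c(0.76, 0.90, 0.65, 0.66, 0.89, 0.64),
  genre        = c("Emo", "hardstyle", "Hiphop", "Hiphop", "hardstyle", "Hiphop")
)
new_song <- c(danceability = 0.71, energy = 0.66)

# Euclidean distance from the new song to every training song
dists <- sqrt((train$danceability - new_song["danceability"])^2 +
              (train$energy       - new_song["energy"])^2)

# Majority vote among the K = 3 nearest neighbours
k <- 3
nearest <- train$genre[order(dists)[1:k]]
predicted <- names(which.max(table(nearest)))
predicted  # "Hiphop"
```

Note that because the vote depends only on which K training points are closest, rescaling one predictor can change the neighbourhood; this is why the recipes below centre and scale both predictors before fitting.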

Preliminary exploratory data analysis

In [1]:
library(readr)
library(repr)
library(tidyverse)
library(tidymodels)
library(ggplot2)
options(repr.matrix.max.rows = 10)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ purrr     1.0.2
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.5     ✔ rsample      1.2.0
✔ dials        1.2.0     ✔ tune         1.1.2
✔ infer        1.0.4     ✔ workflows    1.1.3
✔ modeldata    1.2.0     ✔ workflowsets 1.0.1
✔ parsnip      1.1.1     ✔ yardstick    1.2.0
✔ recipes      1.0.8     

── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
• Use suppressPackageStartupMessages() to eliminate package startup messages

In [2]:
urlfile <- "https://raw.githubusercontent.com/brandonzchen/GroupProjDSCI/main/genres_v2.csv"

mydata <- read_csv(url(urlfile))
Rows: 42305 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (8): type, id, uri, track_href, analysis_url, genre, song_name, title
dbl (14): danceability, energy, key, loudness, mode, speechiness, acousticne...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
In [3]:
# Summarise counts and mean energy/danceability for the three genres of interest

datainformation <- mydata |>
    select(danceability, energy, genre) |>
    filter(genre %in% c("Emo", "hardstyle", "Hiphop")) |>
    group_by(genre) |>
    summarise(count = n(),
              mean_energy = mean(energy),
              mean_danceability = mean(danceability))

datainformation
A tibble: 3 × 4
  genre     count mean_energy mean_danceability
  <chr>     <int>       <dbl>             <dbl>
  Emo        1680   0.7611750         0.4936988
  Hiphop     3028   0.6544179         0.6989818
  hardstyle  2936   0.8962384         0.4780270
In [4]:
song_data <- mydata |>
    select(danceability, energy, genre) |>
    filter(genre %in% c("Emo", "hardstyle", "Hiphop")) |>
    mutate(genre = as_factor(genre)) |>
    drop_na()

genre_plot <- song_data |>
    ggplot(aes(x = energy, y = danceability)) + 
        geom_point(alpha = 0.4, aes(colour = genre)) +
        ggtitle("Figure 1: Scatterplot of the Genres based on Energy and Danceability") +
        xlab("Energy") +
        ylab("Danceability") +
        labs(colour = "Genre") +
        theme(text = element_text(size = 18))
options(repr.plot.width = 10, repr.plot.height = 8)
genre_plot
[Figure 1: scatterplot of danceability vs. energy, coloured by genre]
In [5]:
set.seed(2023)

song_split <- initial_split(song_data, prop = 0.75, strata = genre)
song_train <- training(song_split)
song_test <- testing(song_split)

knn_recipe <- recipe(genre ~ energy + danceability, data = song_train) |>
    step_scale(all_predictors()) |>
    step_center(all_predictors())

knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
    set_engine("kknn") |>
    set_mode("classification")

knn_vfold <- vfold_cv(song_train, v = 5, strata = genre)

k_vals <- tibble(neighbors = seq(from = 75, to = 100, by = 5))

knn_results <- workflow() |>
    add_recipe(knn_recipe) |>
    add_model(knn_spec) |>
    tune_grid(resamples = knn_vfold, grid = k_vals) |>
    collect_metrics()

accuracies <- knn_results |>
    filter(.metric == "accuracy")

k_vs_accuracy_plot <- accuracies |>
    ggplot(aes(x = neighbors, y = mean)) +
    geom_point() +
    geom_line() +
    labs(x = "Neighbors", y = "Estimated Accuracy") +
    ggtitle("Figure 2: Plot of Number of Neighbours vs Estimated Accuracy") +
    theme(text = element_text(size = 15)) +
    scale_x_continuous(breaks = seq(75, 100, by = 5))
options(repr.plot.width = 10, repr.plot.height = 8)
k_vs_accuracy_plot
[Figure 2: number of neighbours vs. estimated accuracy]
In [6]:
set.seed(2023)

song_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 80) |>
    set_engine("kknn") |>
    set_mode("classification")

song_fit <- workflow() |>
    add_recipe(knn_recipe) |>
    add_model(song_spec) |>
    fit(data = song_train)

song_test_predictions <- predict(song_fit, song_test) |>
    bind_cols(song_test) |>
    metrics(truth = genre, estimate = .pred_class) |>
    filter(.metric == "accuracy")
song_test_predictions
A tibble: 1 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
  accuracy multiclass 0.7247514
In [7]:
song_recipe <- recipe(genre ~ energy + danceability, data = song_data) |>
    step_scale(all_predictors()) |>
    step_center(all_predictors())

song_fit_real <- workflow() |>
    add_recipe(song_recipe) |>
    add_model(song_spec) |>
    fit(data = song_data)

new_song_1 <- tibble(energy = 0.29, danceability = 0.56)
new_song_2 <- tibble(energy = 0.889, danceability = 0.628)
new_song_3 <- tibble(energy = 0.84, danceability = 0.75)

new_song_1_predicted <- predict(song_fit_real, new_song_1)
new_song_2_predicted <- predict(song_fit_real, new_song_2)
new_song_3_predicted <- predict(song_fit_real, new_song_3)

new_song_1_predicted
new_song_2_predicted
new_song_3_predicted
A tibble: 1 × 1
  .pred_class
  <fct>
  Emo

A tibble: 1 × 1
  .pred_class
  <fct>
  hardstyle

A tibble: 1 × 1
  .pred_class
  <fct>
  Hiphop
In [10]:
new_songs_predicted_plot <- song_data |>
    ggplot(aes(x = energy, y = danceability)) + 
        geom_point(alpha = 0.4, aes(colour = genre)) +
        xlab("Energy") +
        ylab("Danceability") +
        labs(colour = "Genre") +
        theme(text = element_text(size = 12)) +
        geom_point(aes(x = 0.29, y = 0.56), color = "black", size = 4) +
        geom_point(aes(x = 0.889, y = 0.628), color = "purple", size = 4) +
        geom_point(aes(x = 0.84, y = 0.75), color = "brown", size = 4) +
        ggtitle("Figure 3: Scatterplot of Genres based on Energy and Danceability with New Song Predictions")
options(repr.plot.width = 10, repr.plot.height = 8)
new_songs_predicted_plot
[Figure 3: scatterplot of danceability vs. energy by genre, with the three new songs marked in black, purple, and brown]

Methods

Using the “Dataset of songs in Spotify” dataset, we will conduct a K-NN classification on specific songs to predict their genre, using “danceability” and “energy” as the predictor variables and “genre” as the response variable. We will first filter the dataset to keep only three variables (danceability, energy, and genre), tidy the data, and further shrink it by selecting the three genres Emo, hardstyle, and Hiphop. We will then set aside specific observations whose genre our classifier will predict. Next, we will build, tune, and evaluate our K-NN classification model: dividing the data into a training set and a testing set, using the training set to build and tune the model through cross-validation, and evaluating our chosen K value on the testing set. Finally, we will use the model to predict the genres of the songs we initially set aside and graph the data as a scatterplot, with energy on the x-axis, danceability on the y-axis, colour coding for each of the three genres, and a different colour indicator for the observations we are predicting.

Expected outcomes and significance

What do you expect to find?

  • Correlations between genre and either energy or danceability. For example, hip-hop may have higher danceability scores while hardstyle may have lower ones.
  • Correlations between genre and both energy and danceability. For example, we expect that songs with relatively higher danceability and energy are more likely to be hip-hop songs, while songs with lower danceability and energy scores are more likely to be Emo songs.

What impact could such findings have?

  • These findings can improve the music that is recommended to users in music apps. By exploring the user's preference for danceability and energy in music, the app can better recommend more personalized music based on these trends.
  • These findings can help users find music for different occasions. Users can use this information to select the appropriate music or genre for different occasions.

What future questions could this lead to?

  • This project can lead to thinking about how to make a more accurate genre-predicting model. We can consider and incorporate more features of music that influence genre to make a more comprehensive and accurate model.
  • This project can also lead us to be curious about other trends with these variables, such as how genres have evolved over time in terms of danceability and energy.

Discussion

Summary of what we found

  • We found that the distribution of energy and danceability for songs in the Emo genre is more spread out, spanning both high and low values, than for the other two genres. Although many Emo songs have high energy, we rarely see Emo songs with both high energy and high danceability. Hardstyle songs are very high in energy and intermediate in danceability. Hip-hop songs, on the other hand, have higher danceability than the other two genres, as well as relatively high energy.
  • We observed that the Emo genre shows a weaker association between energy and danceability than the other two genres, because its songs are not concentrated in a specific region of the energy–danceability space the way hip-hop and hardstyle songs are. This has implications for our analysis of the characteristics of Emo songs.
  • In summary, we find that hardstyle songs have the highest energy and hip-hop songs have the highest danceability, while Emo songs show only a weak association with energy and danceability. By summarizing these characteristics, we can classify songs more easily.

Is this what we expected to find?

  • This is largely what we expected to find: we predicted that, for the correlation between genre and either energy or danceability, hip-hop would have higher danceability scores while hardstyle would have relatively lower ones. We can observe these traits in our scatterplot of danceability and energy.
  • We also predicted that, for the correlation between genre and both energy and danceability, songs with relatively higher danceability and energy would more likely be hip-hop songs while those with lower scores would more likely be Emo songs. Based on our observations, the regions with higher danceability and energy are indeed mostly hip-hop, while regions with lower danceability and energy are more likely to contain Emo songs.

What impact could such findings have?

  • These findings can help people learn about the types of music they are interested in or find new songs based on their musical preferences. Moreover, for larger industry or commercial purposes, it can improve the precision of recommending various genres to users in music apps, thereby improving the overall music recommendation process. This would be very useful as current music recommender systems still produce unsatisfactory recommendations due to the multitude of factors that affect users’ tastes and musical needs (Schedl et al., 2018).
  • As we observed, the Emo genre shows a weaker association than the other two genres because its songs are not concentrated in a specific region of energy and danceability the way hip-hop and hardstyle songs are. This may have implications for music recommendation systems, since Emo songs lack strong patterns in energy and danceability. This may be because “emo” music can span a variety of genres such as punk rock, alternative, or slower acoustic music (Mazzaferro, 2010). What categorizes a song as “emo” may be its overemotional lyrics (Mazzaferro, 2010) rather than its audio features, so energy and danceability may not be appropriate variables when dealing with genres that encompass multiple subgenres with widely different audio features. This is a limitation of our model that should be addressed in the future.
  • Users can find the right music for different occasions more quickly and specifically: they can choose the proper type of music for an occasion by selecting danceability and energy values. These could be potential variables to consider when creating situation-aware music recommendation systems, which holistically model contextual and environmental aspects of the music consumption process and incorporate them into the recommendation process (Schedl et al., 2018). For example, the music someone listens to while studying in the library would likely be very different from the music they would listen to while partying with friends.

What future questions could this lead to?

  • How can we make a more accurate genre-predicting model? We can consider and incorporate more features of music that influence genre to make a more comprehensive and accurate model. For example, incorporating users’ emotions into a music prediction system might make it more in tune with users’ personal preferences, as music can evoke very strong emotions (Schedl et al., 2018).
  • We can also incorporate more audio features of music that influence genre. Our accuracy is around 72%, which is adequate but leaves room for improvement. This is likely because the model has difficulty classifying songs under the Emo category, as it includes many subgenres with widely differing audio features. One way around this is to incorporate the evolution of a genre into the model: unsatisfactory classifier performance may not be due to algorithmic flaws, but rather to a change in genre characteristics over time (Nie, 2022). Classifiers trained on songs from different year cohorts might be able to detect stylistic shifts in the genres of interest (Nie, 2022), creating a more accurate classification model.
  • How have genres evolved over time in terms of danceability and energy? Similar to the question above, this explores stylistic shifts in audio features over time, which could help a model discern trends in users’ music tastes and improve its recommendations. For example, the tempo, valence, and danceability of songs have increased since the late 1940s (Ayeni, 2020), useful trends for a classification model to account for when recommending genres to users.

References

Dataset of songs in Spotify. (n.d.). Retrieved December 7, 2023, from https://www.kaggle.com/datasets/mrmorj/dataset-of-songs-in-spotify

Ignatius Moses Setiadi, D. R., Satriya Rahardwika, D., Rachmawanto, E. H., Atika Sari, C., Irawan, C., Kusumaningrum, D. P., Nuri, & Trusthi, S. L. (2020). Comparison of SVM, KNN, and NB Classifier for Genre Music Classification based on Metadata. 2020 International Seminar on Application for Technology of Information and Communication (iSemantic), 12–16. https://doi.org/10.1109/iSemantic50169.2020.9234199

Khan, F., Tarimer, I., Alwageed, H. S., Karadağ, B. C., Fayaz, M., Abdusalomov, A. B., & Cho, Y.-I. (2022). Effect of Feature Selection on the Accuracy of Music Popularity Classification Using Machine Learning Algorithms. Electronics, 11(21), Article 21. https://doi.org/10.3390/electronics11213518

Mazzaferro, V. P. (2010). Effects of the Emo Music Genre. https://digitalcommons.calpoly.edu/comssp/32

Nie, K. (2022, December). Inaccurate Prediction or Genre Evolution? Rethinking Genre Classification. In Ismir 2022 Hybrid Conference.

Schedl, M., Zamani, H., Chen, C.-W., Deldjoo, Y., & Elahi, M. (2018). Current challenges and visions in music recommender systems research. International Journal of Multimedia Information Retrieval, 7(2), 95–116. https://doi.org/10.1007/s13735-018-0154-2

Taunk, K., De, S., Verma, S., & Swetapadma, A. (2019). A Brief Review of Nearest Neighbor Algorithm for Learning and Classification. 2019 International Conference on Intelligent Computing and Control Systems (ICCS), 1255–1260. https://doi.org/10.1109/ICCS45141.2019.9065747
